RSA (algorithm)

RSA
General
Designers Ron Rivest, Adi Shamir, and Leonard Adleman
First published 1978
Certification PKCS#1, ANSI X9.31, IEEE 1363
Cipher detail
Key sizes 1,024 to 4,096 bit typical
Best public cryptanalysis
A 768 bit key has been broken

RSA is an algorithm for public-key cryptography that is based on the presumed difficulty of factoring large integers, the factoring problem. RSA stands for Ron Rivest, Adi Shamir and Leonard Adleman, who first publicly described it in 1978. A user of RSA creates and then publishes the product of two large prime numbers, along with an auxiliary value, as their public key. The prime factors must be kept secret. Anyone can use the public key to encrypt a message, but with currently published methods, if the public key is large enough, only someone with knowledge of the prime factors can feasibly decode the message.[1] Whether breaking RSA encryption is as hard as factoring is an open question known as the RSA problem.

History

Clifford Cocks, an English mathematician working for the UK intelligence agency GCHQ, described an equivalent system in an internal document in 1973. Given the relatively expensive computers needed to implement it at the time, it was mostly considered a curiosity and, as far as is publicly known, was never deployed. Because of its top-secret classification, his discovery was not revealed until 1997, and Rivest, Shamir, and Adleman devised RSA independently of Cocks' work.

The RSA algorithm was publicly described in 1978 by Ron Rivest, Adi Shamir, and Leonard Adleman at MIT; the letters RSA are the initials of their surnames, listed in the same order as on the paper.[2]

MIT was granted U.S. Patent 4,405,829 for a "Cryptographic communications system and method" that used the algorithm in 1983. The patent would have expired on September 21, 2000 (the term of patent was 17 years at the time), but the algorithm was released to the public domain by RSA Security on 6 September 2000, two weeks earlier.[3] Since a paper describing the algorithm had been published in August 1977,[2] prior to the December 1977 filing date of the patent application, regulations in much of the rest of the world precluded patents elsewhere and only the US patent was granted. Had Cocks' work been publicly known, a patent in the US might not have been possible.

From the DWPI's abstract of the patent,

The system includes a communications channel coupled to at least one terminal having an encoding device and to at least one terminal having a decoding device. A message-to-be-transferred is enciphered to ciphertext at the encoding terminal by encoding the message as a number M in a predetermined set. That number is then raised to a first predetermined power (associated with the intended receiver) and finally computed. The remainder or residue, C, is... computed when the exponentiated number is divided by the product of two predetermined prime numbers (associated with the intended receiver).

Operation

The RSA algorithm involves three steps: key generation, encryption and decryption.

Key generation

RSA involves a public key and a private key. The public key can be known to everyone and is used for encrypting messages. Messages encrypted with the public key can only be decrypted using the private key. The keys for the RSA algorithm are generated the following way:

  1. Choose two distinct prime numbers p and q.
    • For security purposes, the integers p and q should be chosen at random, and should be of similar bit-length. Prime integers can be efficiently found using a primality test.
  2. Compute n = pq.
    • n is used as the modulus for both the public and private keys
  3. Compute φ(n) = (p – 1)(q – 1), where φ is Euler's totient function.
  4. Choose an integer e such that 1 < e < φ(n) and the greatest common divisor of e and φ(n) is 1, i.e. e and φ(n) are coprime.
    • e is released as the public key exponent.
    • e having a short bit-length and small Hamming weight results in more efficient encryption - most commonly 0x10001 = 65537. However, small values of e (such as 3) have been shown to be less secure in some settings.[4]
  5. Determine d = e^{-1} mod φ(n); i.e. d is the multiplicative inverse of e modulo φ(n).
    • Equivalently, solve for d such that d·e ≡ 1 (mod φ(n)).
    • This is often computed using the extended Euclidean algorithm.
    • d is kept as the private key exponent.

The public key consists of the modulus n and the public (or encryption) exponent e. The private key consists of the modulus n and the private (or decryption) exponent d which must be kept secret.
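The key generation steps above can be sketched in a few lines of Python. This is a toy illustration only: the function names `egcd` and `make_keypair` are made up for this example, the primes are passed in rather than generated at random, and no primality check or padding is involved.

```python
import math

def egcd(a, b):
    """Extended Euclidean algorithm: return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

def make_keypair(p, q, e=65537):
    """Toy RSA key generation from two given distinct primes (hypothetical helper)."""
    if p == q:
        raise ValueError("p and q must be distinct")
    n = p * q                      # step 2: the modulus
    phi = (p - 1) * (q - 1)        # step 3: Euler's totient of n
    if not 1 < e < phi or math.gcd(e, phi) != 1:
        raise ValueError("e must satisfy 1 < e < phi(n) and be coprime to phi(n)")
    _, x, _ = egcd(e, phi)         # step 5: e*x + phi*y == 1
    d = x % phi                    # normalise the inverse into the range [0, phi)
    return (n, e), (n, d)

public_key, private_key = make_keypair(61, 53, e=17)
print(public_key, private_key)     # (3233, 17) (3233, 2753)
```

The small primes 61 and 53 are used only so that the numbers stay readable; real keys use randomly chosen primes of a thousand bits or more.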

Encryption

Alice transmits her public key (n,e) to Bob and keeps the private key secret. Bob then wishes to send message M to Alice.

He first turns M into an integer m, such that 0 < m < n by using an agreed-upon reversible protocol known as a padding scheme. He then computes the ciphertext c corresponding to

 c = m^e\text{ (mod }n\text{)}.

This can be done quickly using the method of exponentiation by squaring. Bob then transmits c to Alice.
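As a rough illustration of exponentiation by squaring, the following Python sketch computes a modular power one exponent bit at a time. The function name `modexp` is hypothetical, and the numbers m = 65, e = 17, n = 3233 are toy values; Python's built-in three-argument `pow` does the same job and is what one would normally use.

```python
def modexp(base, exponent, modulus):
    """Right-to-left binary exponentiation: (base ** exponent) % modulus."""
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                   # current low bit set: multiply it in
            result = (result * base) % modulus
        base = (base * base) % modulus     # square for the next bit
        exponent >>= 1
    return result

c = modexp(65, 17, 3233)
print(c)                       # 2790
print(c == pow(65, 17, 3233))  # True
```

The loop runs once per bit of the exponent, so the cost grows with the bit-length of e rather than with e itself.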

Note that at least nine values of m will yield a ciphertext c equal to m,[5] but this is very unlikely to occur in practice.
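For a tiny modulus the messages that encrypt to themselves can simply be enumerated. The sketch below does this by brute force for the toy parameters n = 3233 = 61 · 53 and e = 17 (illustrative values; `fixed_points` is a hypothetical helper name).

```python
def fixed_points(n, e):
    """All m in [0, n) whose textbook RSA ciphertext equals m itself."""
    return [m for m in range(n) if pow(m, e, n) == m]

points = fixed_points(3233, 17)
print(len(points))              # 25 for this toy key
print(0 in points, 1 in points, 3232 in points)  # True True True
```

Here the count exceeds nine because e - 1 = 16 shares the factor 4 with both p - 1 = 60 and q - 1 = 52. Even so, 25 values out of 3233 is a small fraction, and for realistic key sizes the fraction is negligible.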

Decryption

Alice can recover m from c by using her private key exponent d via computing

 m = c^d\text{ (mod }n\text{)}.

Given m, she can recover the original message M by reversing the padding scheme.

(In practice, there are more efficient methods of calculating c^d using precomputed values, as described below.)

Using the Chinese remainder algorithm

For efficiency, many popular crypto libraries (such as OpenSSL, Java and .NET) use the following optimization for decryption and signing. The following values are precomputed and stored as part of the private key:

  • p and q: the primes from the key generation,
  • dp = d mod (p – 1),
  • dq = d mod (q – 1), and
  • qInv = q^{-1} mod p.

These values allow the recipient to compute the exponentiation m = c^d\text{ (mod }pq\text{)} more efficiently as follows:

  • m1 = c^{dp} mod p
  • m2 = c^{dq} mod q
  • h = qInv · (m1 – m2) mod p
  • m = m2 + h · q

This is more efficient than computing m = c^d\text{ (mod }pq\text{)} directly, even though two modular exponentiations have to be computed. The reason is that these two modular exponentiations both use a smaller exponent and a smaller modulus.
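A minimal Python sketch of decryption with the Chinese remainder theorem follows. It assumes the usual definitions dp = d mod (p - 1), dq = d mod (q - 1) and qInv = q^{-1} mod p; the function names are invented for this example, and the toy key p = 61, q = 53, d = 2753 is used only to keep the numbers small. The modular inverse via `pow(q, -1, p)` needs Python 3.8 or later.

```python
def crt_params(p, q, d):
    """Precompute the CRT values stored alongside the private key."""
    dp = d % (p - 1)
    dq = d % (q - 1)
    q_inv = pow(q, -1, p)        # modular inverse of q modulo p
    return dp, dq, q_inv

def crt_decrypt(c, p, q, dp, dq, q_inv):
    """Recover m = c^d mod pq from two half-size exponentiations."""
    m1 = pow(c, dp, p)           # m mod p
    m2 = pow(c, dq, q)           # m mod q
    h = (q_inv * (m1 - m2)) % p  # recombination coefficient
    return m2 + h * q

p, q, d = 61, 53, 2753
dp, dq, q_inv = crt_params(p, q, d)
print(dp, dq, q_inv)                             # 53 49 38
print(crt_decrypt(2790, p, q, dp, dq, q_inv))    # 65
print(pow(2790, d, p * q))                       # 65, the direct computation
```

Both exponentiations work with numbers about half the size of n, which is where the speed-up comes from.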

A working example

Here is an example of RSA encryption and decryption. The parameters used here are artificially small, but one can also use OpenSSL to generate and examine a real keypair.

  1. Choose two distinct prime numbers, such as
    p = 61 and q=53.
  2. Compute n = p q giving
    n = 61 · 53 = 3233.
  3. Compute the totient of the product as \phi(n) = (p-1)(q-1) giving
    \phi(3233) = (61 - 1)(53 - 1) = 3120.
  4. Choose any number 1 < e < 3120 that is coprime to 3120. Choosing a prime number for e leaves us only to check that e is not a divisor of 3120.
    Let e=17.
  5. Compute d, the modular multiplicative inverse of e\text{ (mod }\phi(n)\text{)} yielding
    d=2753.

The public key is (n=3233, e=17). For a padded plaintext message m, the encryption function is m^{17}\text{ (mod }3233\text{)}.

The private key is (n=3233, d=2753). For an encrypted ciphertext c, the decryption function is c^{2753}\text{ (mod }3233\text{)}.

For instance, in order to encrypt m=65, we calculate

c = 65^{17}\text{ (mod }3233\text{)} = 2790.

To decrypt c = 2790, we calculate

m = 2790^{2753}\text{ (mod }3233\text{)} = 65.

Both of these calculations can be computed efficiently using the square-and-multiply algorithm for modular exponentiation. In real life situations the primes selected would be much larger; in our example it would be relatively trivial to factor n, 3233, obtained from the freely available public key back to the primes p and q. Given e, also from the public key, we could then compute d and so acquire the private key.
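To make the last point concrete, here is a small Python sketch that recovers the private exponent of the toy key from the public key alone. Trial division stands in for a real factoring algorithm and is feasible only because n is tiny; the helper names are hypothetical, and `pow(e, -1, phi)` needs Python 3.8 or later.

```python
def factor_semiprime(n):
    """Find the two prime factors of a small semiprime by trial division."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n has no nontrivial factor")

def recover_private_exponent(n, e):
    """Factor n, rebuild phi(n), and invert e modulo phi(n)."""
    p, q = factor_semiprime(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)

d = recover_private_exponent(3233, 17)
print(d)                    # 2753
print(pow(2790, d, 3233))   # 65, the original plaintext
```

For a modulus of realistic size the factoring step is the part that is believed to be infeasible.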

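Practical implementations use the Chinese remainder theorem to speed up the calculation, replacing one exponentiation modulo pq with two cheaper ones modulo p and modulo q.

The values dp, dq and qInv, which are part of the private key, are computed as follows:

  dp = d mod (p – 1) = 2753 mod (61 – 1) = 53
  dq = d mod (q – 1) = 2753 mod (53 – 1) = 49
  qInv = q^{-1} mod p = 53^{-1} mod 61 = 38 (since 38 · 53 = 2014 = 33 · 61 + 1)

Here is how dp, dq and qInv are used for efficient decryption. (Encryption is already efficient because of the choice of public exponent e.)

  m1 = c^{dp} mod p = 2790^{53} mod 61 = 4
  m2 = c^{dq} mod q = 2790^{49} mod 53 = 12
  h = (qInv · (m1 – m2)) mod p = (38 · (–8)) mod 61 = 1
  m = m2 + h · q = 12 + 1 · 53 = 65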

Attacks against plain RSA

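There are a number of attacks against plain (unpadded) RSA:

  • With a low encryption exponent (e.g., e = 3) and a small message m, namely m < n^{1/e}, the value m^e is strictly less than the modulus n. The ciphertext can then be decrypted simply by taking the eth root of the ciphertext over the integers.
  • If the same plaintext is sent to e or more recipients who share the exponent e but use different moduli n, the plaintext can be recovered with the Chinese remainder theorem. Johan Håstad showed that the attack still works when the plaintexts are not equal but are related by a known linear relation.[6] Don Coppersmith later improved this attack.[7]
  • Because plain RSA is deterministic (it has no random component), an attacker can mount a chosen-plaintext attack by encrypting likely plaintexts under the public key and comparing the results with the ciphertext. Plain RSA is therefore not semantically secure.
  • RSA is multiplicative: the product of two ciphertexts is the encryption of the product of the two plaintexts, m_1^e m_2^e \equiv (m_1 m_2)^e \pmod{n}. This property makes a chosen-ciphertext attack possible.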
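One such weakness is easy to demonstrate: with a small public exponent and a short unpadded message, m^e may be smaller than n, so the modular reduction never happens and the plaintext is just the integer eth root of the ciphertext. The Python sketch below assumes e = 3 and builds a modulus from the two primes 1,000,000,007 and 998,244,353, chosen here purely for illustration; `integer_root` is a hypothetical helper. The attacker uses only public information.

```python
def integer_root(x, k):
    """Largest integer r with r**k <= x, found by binary search."""
    lo, hi = 0, 1
    while hi ** k <= x:          # grow hi until hi**k exceeds x
        hi *= 2
    while lo < hi - 1:           # invariant: lo**k <= x < hi**k
        mid = (lo + hi) // 2
        if mid ** k <= x:
            lo = mid
        else:
            hi = mid
    return lo

p, q = 1_000_000_007, 998_244_353   # illustrative primes, far too small for real use
n, e = p * q, 3
m = 424_242                         # short unpadded message with m**e < n
c = pow(m, e, n)                    # textbook RSA encryption

recovered = integer_root(c, e)      # no private key needed
print(recovered == m)               # True
```

Randomized padding defeats this by making the padded message large enough that m^e wraps around the modulus.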

Padding schemes

To avoid these problems, practical RSA implementations typically embed some form of structured, randomized padding into the value m before encrypting it. This padding ensures that m does not fall into the range of insecure plaintexts, and that a given message, once padded, will encrypt to one of a large number of different possible ciphertexts.

Standards such as PKCS#1 have been carefully designed to securely pad messages prior to RSA encryption. Because these schemes pad the plaintext m with some number of additional bits, the size of the un-padded message M must be somewhat smaller. RSA padding schemes must be carefully designed so as to prevent sophisticated attacks which may be facilitated by a predictable message structure. Early versions of the PKCS#1 standard (up to version 1.5) used a construction that appeared to make RSA semantically secure, but it was later found vulnerable to a practical adaptive chosen ciphertext attack. Later versions of the standard include Optimal Asymmetric Encryption Padding (OAEP), which prevents these attacks. The PKCS#1 standard also incorporates processing schemes designed to provide additional security for RSA signatures, e.g., the Probabilistic Signature Scheme for RSA (RSA-PSS).

Signing messages

Suppose Alice uses Bob's public key to send him an encrypted message. In the message, she can claim to be Alice but Bob has no way of verifying that the message was actually from Alice since anyone can use Bob's public key to send him encrypted messages. In order to verify the origin of a message, RSA can also be used to sign a message.

Suppose Alice wishes to send a signed message to Bob. She can use her own private key to do so. She produces a hash value of the message, raises it to the power of d modulo n (as she does when decrypting a message), and attaches the result as a "signature" to the message. When Bob receives the signed message, he uses the same hash algorithm in conjunction with Alice's public key. He raises the signature to the power of e modulo n (as he does when encrypting a message), and compares the resulting hash value with the message's actual hash value. If the two agree, he knows that the author of the message was in possession of Alice's private key, and that the message has not been tampered with since.
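The hash-then-sign flow can be sketched in Python as follows. This is a toy under loud assumptions: SHA-256 is an arbitrary choice of hash, the digest is reduced modulo a tiny n (3233) just so that it fits, and no padding scheme such as RSA-PSS is applied, so the sketch shows the arithmetic only and is not secure. The function names are invented for the example.

```python
import hashlib

def toy_hash(message: bytes, n: int) -> int:
    """Hash the message and squeeze it into [0, n). Toy only: real schemes pad instead."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % n

def sign(message: bytes, n: int, d: int) -> int:
    """Alice: raise the hash to the private exponent d modulo n."""
    return pow(toy_hash(message, n), d, n)

def verify(message: bytes, signature: int, n: int, e: int) -> bool:
    """Bob: raise the signature to the public exponent e and compare hashes."""
    return pow(signature, e, n) == toy_hash(message, n)

n, e, d = 3233, 17, 2753
signature = sign(b"hello", n, d)
print(verify(b"hello", signature, n, e))             # True
print(verify(b"hello", (signature + 1) % n, n, e))   # False: altered signature
```

With a modulus this small, different messages can easily collide to the same reduced hash, which is one reason the sketch must not be mistaken for a usable signature scheme.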

Secure padding schemes such as RSA-PSS are as essential for the security of message signing as they are for message encryption. The same key should never be used for both encryption and signing.[8]

Security and practical considerations

Integer factorization and RSA problem

The security of the RSA cryptosystem is based on two mathematical problems: the problem of factoring large numbers and the RSA problem. Full decryption of an RSA ciphertext is thought to be infeasible on the assumption that both of these problems are hard, i.e., no efficient algorithm exists for solving them. Providing security against partial decryption may require the addition of a secure padding scheme.

The RSA problem is defined as the task of taking eth roots modulo a composite n: recovering a value m such that c\equiv m^e\text{ (mod }n\text{)}, where (n, e) is an RSA public key and c is an RSA ciphertext. Currently the most promising approach to solving the RSA problem is to factor the modulus n. With the ability to recover prime factors, an attacker can compute the secret exponent d from a public key (n, e), then decrypt c using the standard procedure. To accomplish this, an attacker factors n into p and q, and computes (p-1)(q-1) which allows the determination of d from e. No polynomial-time method for factoring large integers on a classical computer has yet been found, but it has not been proven that none exists. See integer factorization for a discussion of this problem. Rivest, Shamir and Adleman note[1] that Miller has shown that - assuming the Extended Riemann Hypothesis (though others call it the Generalized Riemann Hypothesis) - finding d from n and e is as hard as factoring n into p and q (up to a polynomial time difference).[9] However, this proof does not imply that inverting RSA is equally hard as factoring.

As of 2010, the largest (known) number factored by a general-purpose factoring algorithm was 768 bits long (see RSA-768), using a state-of-the-art distributed implementation. RSA keys are typically 1024–2048 bits long. Some experts believe that 1024-bit keys may become breakable in the near future (though this is disputed); few see any way that 4096-bit keys could be broken in the foreseeable future. Therefore, it is generally presumed that RSA is secure if n is sufficiently large. If n is 300 bits or shorter, it can be factored in a few hours on a personal computer, using software already freely available. Keys of 512 bits have been shown to be practically breakable in 1999 when RSA-155 was factored by using several hundred computers and are now factored in a few weeks using common hardware.[10] Exploits using 512-bit code-signing certificates that may have been factored were reported in 2011.[11] A theoretical hardware device named TWIRL and described by Shamir and Tromer in 2003 called into question the security of 1024 bit keys. It is currently recommended that n be at least 2048 bits long.[12]

In 1994, Peter Shor showed that a quantum computer (if one could ever be practically created for the purpose) would be able to factor in polynomial time, breaking RSA.

Key generation

Finding the large primes p and q is usually done by testing random numbers of the right size with probabilistic primality tests which quickly eliminate virtually all non-primes.
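One widely used probabilistic primality test is Miller-Rabin; the text above does not name a specific test, so its use here is an assumption made for illustration. The Python sketch below draws random odd candidates of a given bit-length and keeps the first one that passes. The function names and the choice of 40 rounds are illustrative.

```python
import secrets

def is_probable_prime(n, rounds=40):
    """Miller-Rabin test: False means composite, True means prime with high probability."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % sp == 0:
            return n == sp          # small prime itself, or a multiple of one
    s, t = 0, n - 1                 # write n - 1 = 2**s * t with t odd
    while t % 2 == 0:
        s += 1
        t //= 2
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2     # random base in [2, n - 2]
        x = pow(a, t, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False            # a is a witness that n is composite
    return True

def random_prime(bits):
    """Draw random odd numbers with the top bit set until one passes the test."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

print(random_prime(64).bit_length())   # 64
```

Each round lets a composite slip through with probability at most 1/4, so 40 rounds leave an error probability of at most 4^-40.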

The numbers p and q should not be 'too close', lest the Fermat factorization of n be successful: if p − q is, for instance, less than 2n^{1/4} (which even for small 1024-bit values of n is about 2×10^{77}), solving for p and q is trivial. Furthermore, if either p − 1 or q − 1 has only small prime factors, n can be factored quickly by Pollard's p − 1 algorithm, and these values of p or q should therefore be discarded as well.
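The danger of close primes can be seen with a short Python sketch of Fermat's factorization method, which searches for a such that a² − n is a perfect square b², giving n = (a − b)(a + b). The function name, the step limit and the example primes (the twin primes 1,000,000,007 and 1,000,000,009) are choices made for this illustration.

```python
import math

def fermat_factor(n, max_steps=1_000_000):
    """Fermat's method for odd n: look for a with a*a - n a perfect square."""
    a = math.isqrt(n)
    if a * a < n:
        a += 1                      # start at ceil(sqrt(n))
    for _ in range(max_steps):
        b_squared = a * a - n
        b = math.isqrt(b_squared)
        if b * b == b_squared:
            return a - b, a + b     # n = (a - b) * (a + b)
        a += 1
    return None                     # gave up: the factors are not close enough

p, q = 1_000_000_007, 1_000_000_009
print(fermat_factor(p * q))   # (1000000007, 1000000009)
print(fermat_factor(3233))    # (53, 61)
```

When p and q are very close, the search succeeds within the first few values of a, regardless of how large n is.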

It is important that the private exponent d be large enough. Michael J. Wiener showed[13] that if p is between q and 2q (which is quite typical) and d < n^{1/4}/3, then d can be computed efficiently from n and e.

There is no known attack against small public exponents such as e = 3, provided that proper padding is used. However, when no padding is used, or when the padding is improperly implemented, small public exponents have a greater risk of leading to an attack, such as the unpadded plaintext vulnerability listed above. 65537 is a commonly used value for e. This value can be regarded as a compromise between avoiding potential small exponent attacks and still allowing efficient encryptions (or signature verification). The NIST Special Publication on Computer Security (SP 800-78 Rev 1 of August 2007) does not allow public exponents e smaller than 65537, but does not state a reason for this restriction.

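Because RSA is much slower than symmetric ciphers, it is typically used to encrypt only a short symmetric key, which in turn encrypts the message itself. This procedure raises additional security issues. For instance, it is of utmost importance to use a strong random number generator for the symmetric key, because otherwise Eve (an eavesdropper wanting to see what was sent) could bypass RSA by guessing the symmetric key.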

Timing attacks

Kocher described a new attack on RSA in 1995: if the attacker Eve knows Alice's hardware in sufficient detail and is able to measure the decryption times for several known ciphertexts, she can deduce the decryption key d quickly. This attack can also be applied against the RSA signature scheme. In 2003, Boneh and Brumley demonstrated a more practical attack capable of recovering RSA factorizations over a network connection (e.g., from a Secure Socket Layer (SSL)-enabled webserver).[14] This attack takes advantage of information leaked by the Chinese remainder theorem optimization used by many RSA implementations.

One way to thwart these attacks is to ensure that the decryption operation takes a constant amount of time for every ciphertext. However, this approach can significantly reduce performance. Instead, most RSA implementations use an alternate technique known as cryptographic blinding. RSA blinding makes use of the multiplicative property of RSA. Instead of computing c^d\text{ (mod }n\text{)}, Alice first chooses a secret random value r and computes (r^e c)^d\text{ (mod }n\text{)}. The result of this computation after applying Euler's Theorem is r c^d\text{ (mod }n\text{)} and so the effect of r can be removed by multiplying by its inverse. A new value of r is chosen for each ciphertext. With blinding applied, the decryption time is no longer correlated to the value of the input ciphertext and so the timing attack fails.
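The blinding computation described above can be sketched in Python as follows. The function name is hypothetical and the toy key (n = 3233, e = 17, d = 2753) is far too small for real use; the sketch shows only the algebra, and makes no claim that this Python code itself runs in constant time.

```python
import math
import secrets

def blinded_decrypt(c, n, e, d):
    """Compute c^d mod n without exponentiating c itself by the secret d."""
    while True:
        r = secrets.randbelow(n - 2) + 2     # fresh random r in [2, n - 1]
        if math.gcd(r, n) == 1:              # r must be invertible modulo n
            break
    blinded = (pow(r, e, n) * c) % n         # r^e * c
    blinded_result = pow(blinded, d, n)      # (r^e * c)^d = r * c^d (mod n)
    r_inv = pow(r, -1, n)                    # Python 3.8+: modular inverse
    return (blinded_result * r_inv) % n      # strip r to leave c^d

print(blinded_decrypt(2790, 3233, 17, 2753))   # 65
```

Because r changes on every call, the value actually raised to the power d is unpredictable to an attacker who chooses c.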

Adaptive chosen ciphertext attacks

In 1998, Daniel Bleichenbacher described the first practical adaptive chosen ciphertext attack, against RSA-encrypted messages using the PKCS #1 v1.5 padding scheme. (A padding scheme randomizes and adds structure to an RSA-encrypted message, so it is possible to determine whether a decrypted message is valid.) Due to flaws in the PKCS #1 scheme, Bleichenbacher was able to mount a practical attack against RSA implementations of the Secure Socket Layer protocol, and to recover session keys. As a result of this work, cryptographers now recommend the use of provably secure padding schemes such as Optimal Asymmetric Encryption Padding, and RSA Laboratories has released new versions of PKCS #1 that are not vulnerable to these attacks.

Side-channel analysis attacks

A side-channel attack using branch prediction analysis (BPA) has been described. Many processors use a branch predictor to determine whether a conditional branch in the instruction flow of a program is likely to be taken or not. Often these processors also implement simultaneous multithreading (SMT). Branch prediction analysis attacks use a spy process to discover (statistically) the private key when processed with these processors.

Simple Branch Prediction Analysis (SBPA) claims to improve BPA in a non-statistical way. In their paper, "On the Power of Simple Branch Prediction Analysis",[15] the authors of SBPA (Onur Aciicmez and Cetin Kaya Koc) claim to have discovered 508 out of 512 bits of an RSA key in 10 iterations.

A power fault attack on RSA implementations was described in 2010.[16] The authors recovered the key by varying the CPU supply voltage outside its limits; this caused multiple power faults on the server.

Proofs of correctness

Proof using Fermat's Little Theorem

The proof of the correctness of RSA is based on Fermat's little theorem. This theorem states that if p is prime and p does not divide an integer a then

 a^{(p-1)} \equiv 1\text{ (mod }p\text{)}.

We want to show that (m^e)^d \equiv m \pmod{pq} for every integer m when p and q are distinct prime numbers and e and d are positive integers satisfying

e d \equiv 1\text{ (mod }(p-1)(q-1)\text{)}.

We can write

e d - 1 = h(p-1)(q-1)

for some nonnegative integer h.

To check that two numbers, like m^{ed} and m, are congruent mod pq it suffices (and in fact is equivalent) to check that they are congruent mod p and mod q separately. (This is part of the Chinese remainder theorem, although it is not the significant part of that theorem.) To show m^{ed} \equiv m \pmod{p}, we consider two cases: m \equiv 0 \pmod{p} and m \not\equiv 0 \pmod{p}. In the first case m^{ed} is a multiple of p, so m^{ed} \equiv 0 \equiv m \pmod{p}. In the second case

m^{e d} = m^{(e d - 1)}m = m^{h(p-1)(q-1)}m = (m^{p-1})^{h(q-1)}m \equiv 1^{h(q-1)}m \equiv m \bmod p,

where we used Fermat's little theorem to replace m^{p-1} mod p with 1.

The verification that m^{ed} \equiv m \pmod{q} proceeds in a similar way, treating separately the cases m \equiv 0 \pmod{q} and m \not\equiv 0 \pmod{q}, using Fermat's little theorem for modulus q in the second case.

This completes the proof that, for any integer m,

\left(m^e\right)^d \equiv m\text{ (mod }pq\text{)}.

Proof using Euler's Theorem

Although the original paper of Rivest, Shamir, and Adleman used Fermat's little theorem to explain why RSA works, it is common to find proofs that rely instead on Euler's theorem.

We want to show m^{ed} \equiv m \pmod{n}, where n = pq is a product of two different prime numbers and e and d are positive integers satisfying ed \equiv 1 \pmod{\varphi(n)}. Since e and d are positive, we can write ed = 1 + h\varphi(n) for some nonnegative integer h. Assuming that m is relatively prime to n, we have

m^{ed} \equiv m^{1 + h\varphi(n)} \equiv m (m^{\varphi(n)})^{h} \equiv m\text{ (mod }n\text{)},

where the last congruence directly follows from Euler's theorem. When m is not relatively prime to n, the argument just given is invalid. However, the desired congruence is still true. Either m \equiv 0 \pmod{p} or m \equiv 0 \pmod{q}, and these cases can be treated using the previous proof.

Numerous references which explain RSA using Euler's theorem deal with the case that the message m is not relatively prime to the modulus pq by a misleading probabilistic argument: the proportion of integers mod pq that have a factor in common with the modulus is 1 - (p-1)(q-1)/pq = 1/p + 1/q - 1/pq, which is very small when p and q are large, so the chance of the message having a factor in common with the modulus can be considered remote in practice. What is misleading here is that, as the proof with Fermat's little theorem shows, nothing breaks down in the case of messages having a factor in common with the modulus: one has m^{ed} \equiv m \pmod{n} for all m without exceptions. Therefore the correctness of RSA should really be considered an application of Fermat's little theorem rather than Euler's theorem, just as in the original RSA paper.
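The claim that the congruence holds for every m, including those sharing a factor with the modulus, can be checked exhaustively for a small key. The Python sketch below uses the toy parameters n = 3233 = 61 · 53, e = 17, d = 2753 (illustrative values); it is a numerical sanity check, not a substitute for the proof.

```python
import math

n, e, d = 3233, 17, 2753

# Encrypt then decrypt every residue modulo n, coprime to n or not.
all_ok = all(pow(pow(m, e, n), d, n) == m for m in range(n))
print(all_ok)                       # True

# Nonzero messages that share a factor with n, where Euler's theorem does not apply.
shared = [m for m in range(1, n) if math.gcd(m, n) != 1]
print(len(shared))                  # 112, i.e. (p - 1) + (q - 1)
print(all(pow(m, e * d, n) == m for m in shared))   # True
```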

Notes

  1. ^ a b Rivest, R.; A. Shamir; L. Adleman (1978). "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems". Communications of the ACM 21 (2): 120–126. doi:10.1145/359340.359342. http://theory.lcs.mit.edu/~rivest/rsapaper.pdf. 
  2. ^ a b SIAM News, Volume 36, Number 5, June 2003, "Still Guarding Secrets after Years of Attacks, RSA Earns Accolades for its Founders", by Sara Robinson
  3. ^ http://www.rsa.com/press_release.aspx?id=261
  4. ^ Boneh, Dan (1999). "Twenty Years of attacks on the RSA Cryptosystem". Notices of the American Mathematical Society (AMS) 46 (2): 203–213. http://crypto.stanford.edu/~dabo/abstracts/RSAattack-survey.html. 
  5. ^ Namely, the values of m which are equal to -1, 0, or 1 modulo p while also equal to -1, 0, or 1 modulo q. There will be more values of m having c=m if p-1 or q-1 has other divisors in common with e-1 besides 2 because this gives more values of m such that m^{e-1}\text{ (mod }p\text{)}=1 or m^{e-1}\text{ (mod }q\text{)}=1 respectively.
  6. ^ Johan Håstad, "On using RSA with Low Exponent in a Public Key Network", Crypto 85
  7. ^ Don Coppersmith, "Small Solutions to Polynomial Equations, and Low Exponent RSA Vulnerabilities", Journal of Cryptology, v. 10, n. 4, Dec. 1997
  8. ^ http://www.di-mgt.com.au/rsa_alg.html#weaknesses
  9. ^ Gary L. Miller, "Riemann's Hypothesis and Tests for Primality"
  10. ^ 518-bit GNFS with msieve
  11. ^ RSA-512 certificates abused in-the-wild
  12. ^ Has the RSA algorithm been compromised as a result of Bernstein's Paper? What key size should I be using?
  13. ^ Wiener, Michael J. (May 1990). "Cryptanalysis of short RSA secret exponents". Information Theory, IEEE Transactions on 36 (3): 553–558. doi:10.1109/18.54902. 
  14. ^ "Remote timing attacks are practical". SSYM'03: Proceedings of the 12th conference on USENIX Security Symposium.
  15. ^ http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.1438&rep=rep1&type=pdf
  16. ^ FaultBased Attack of RSA Authentication
